Introduction to Data Analysis

  • UNIT OF ANALYSIS
  • POPULATION
  • SAMPLE
  • N & n
  • DESCRIPTIVE STATISTICS
  • INFERENTIAL STATISTICS
  • TIDY DATA
  • VARIABLES
  • DICHOTOMOUS
  • NOMINAL
  • ORDINAL
  • INTERVAL-RATIO

Unit of Analysis

Who or what is being studied?


POPULATION

All units of analysis (people, institutions, groups, etc.) in which the researcher is interested.


SAMPLE

A subset of people (or institutions, groups, etc.) selected from a population.
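As a rough illustration of the two definitions above, the sketch below draws a simple random sample from a made-up population. It is written in Python purely for illustration; the population of 500 student ID numbers and the sample size of 50 are assumptions, not course data. By the usual convention, N is the number of units in the population and n is the number of units in the sample.

```python
import random

# Hypothetical population: ID numbers for 500 students (made up for illustration).
population = list(range(1, 501))

# Draw a simple random sample of 50 units, without replacement.
rng = random.Random(42)  # fixed seed so the same draw can be repeated
sample = rng.sample(population, k=50)

N = len(population)  # population size
n = len(sample)      # sample size

print(f"N = {N}, n = {n}")  # prints "N = 500, n = 50"
```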

DESCRIPTIVE STATISTICS

Procedures that help us organize and describe data collected from a sample or population.
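A minimal sketch of what "organize and describe" can look like in practice, using Python's standard `statistics` module. The eight sleep values are invented for illustration, and the choice of summaries (mean, median, mode, standard deviation) is an assumption about which descriptive statistics are of interest.

```python
import statistics

# Hypothetical data: hours of sleep reported by eight survey respondents.
hours_of_sleep = [6, 7, 7, 8, 5, 9, 7, 6]

mean_hours = statistics.mean(hours_of_sleep)      # arithmetic average
median_hours = statistics.median(hours_of_sleep)  # middle value
mode_hours = statistics.mode(hours_of_sleep)      # most common value
spread = statistics.stdev(hours_of_sleep)         # sample standard deviation

print(f"mean = {mean_hours}, median = {median_hours}, mode = {mode_hours}")
print(f"standard deviation = {spread:.2f}")
```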


INFERENTIAL STATISTICS

Procedures that help us make predictions or inferences about a population using observations and analyses from a sample.
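One way to picture the step from sample to population is the sketch below, which estimates a population mean from a sample and attaches an approximate 95% confidence interval. It is a Python illustration under stated assumptions: the population is simulated (commute times with a mean near 30 minutes), the sample size of 100 is arbitrary, and the interval uses the normal approximation rather than any method specific to this course.

```python
import math
import random
import statistics

rng = random.Random(1)  # fixed seed so the simulation can be repeated

# Hypothetical population: commute times in minutes for 10,000 workers (simulated).
population = [rng.gauss(30, 8) for _ in range(10_000)]

# In practice we only observe a sample; here, 100 randomly chosen workers.
sample = rng.sample(population, k=100)

# Descriptive step: summarize the sample.
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)

# Inferential step: approximate 95% confidence interval for the population mean.
z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96
margin = z * sample_sd / math.sqrt(len(sample))
lower, upper = sample_mean - margin, sample_mean + margin

print(f"Sample mean: {sample_mean:.1f}")
print(f"Approximate 95% CI for the population mean: ({lower:.1f}, {upper:.1f})")
```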

Tidy Data

VARIABLES

Any factor, trait, or condition that can exist in differing amounts or types.

Measurement Levels

Dichotomous (aka binary)
A variable with only two categories.

Nominal
A variable made up of categories that cannot be ordered.

Ordinal
A variable made up of ranked categories, with no systematic or measurable numeric difference between the categories.

Interval-ratio (aka continuous)
A variable whose values are ordered and expressed in the same units, so the differences between values are measurable.
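The four measurement levels can be sketched with a small made-up survey. This is a Python illustration only; the variable names, the responses, and the education ranking are all hypothetical. The point is that the measurement level determines which summaries make sense.

```python
from collections import Counter
import statistics

# Hypothetical survey responses, one list per variable.
employed = ["yes", "no", "yes", "yes"]         # dichotomous: only two categories
region = ["West", "South", "West", "Midwest"]  # nominal: categories with no order
education = ["HS", "BA", "HS", "MA"]           # ordinal: ranked categories
age = [34, 51, 28, 45]                         # interval-ratio: same units (years)

# Nominal categories can be counted, but not ranked.
most_common_region = Counter(region).most_common(1)
print(most_common_region)  # prints [('West', 2)]

# Ordinal categories can be ranked once the order is stated explicitly.
education_order = ["HS", "BA", "MA"]
ranked = sorted(education, key=education_order.index)
print(ranked)  # prints ['HS', 'HS', 'BA', 'MA']

# Only interval-ratio values support arithmetic such as a mean.
mean_age = statistics.mean(age)
print(mean_age)  # prints 39.5
```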

Learning to Code

Technology is fun!


In this course, you’re learning not just the statistical concepts but also how to produce the statistics. Analyzing data requires learning to use new technology.


Learning statistical software to analyze data can be really fun. You get to learn about real-world social problems!

Technology is challenging!

It can be frustrating.


When it feels like the technology is preventing you from getting to the course content, take a deep breath, and remember that building your technology skills is part of this course.

Why am I making you learn something so frustrating?


Calculating the statistics by hand quickly gets cumbersome, time-consuming, and difficult.


Good social science is built on replication, and code gives others an exact record of your analysis that they can rerun.

Grappling

Learning to use statistical software necessitates grappling.

Grappling implies trying even after you fail the first time.


It’s thinking, “First, I’ll work with it independently. Okay, I’m really not understanding it. Let me go back to my notes. Okay, I have solved for the first part of it. Now I have the second part of it. Okay, I got the question wrong; let me try again. Maybe I can ask my peer now.”


Grappling is working hard to make sure you understand the problem fully, and then using every resource at your fingertips to solve it.

Most statistical analyses get finished not because the analyst is a math genius, but because the analyst persisted through a minefield of technical issues by being an excellent problem-solver.

Coding is mostly Googling


It is a misconception that the best statistical analysts sit down at their computers and type code from memory.


Much of the process of coding is copying code from somewhere else and modifying it to fit your particular situation.

When you get stuck…

…there are many options to get unstuck:

  • Review the slides. Pay very close attention to small details.
  • Try something else to see if you get a new error.
  • Use Google to search for possible answers or new explanations.
  • Watch a help video on YouTube on the topic.
  • Restart your web browser or device.
  • Try another web browser or device.
  • Ask a peer or an advanced student.
  • Start or join a weekly study group.
  • Post the question on the class discussion board.
  • Email your TA.

Help in this class

Before requesting an individual meeting with a TA:

  • Spend a sufficient amount of time working on the problem on your own.
  • Ask two of your peers.
  • Post the question on the class discussion board.

When emailing:

  • Explain what troubleshooting steps you’ve already taken.
  • Report who you’ve already asked for help.

Create a trail!

Create a reproducible example


Goal: Make someone else feel your pain!

  • Assume others know nothing about your issue. 
  • Describe the steps that produce the problem so that someone else can replicate it.
  • This means clearly describing the issue and the steps you’ve already taken to solve it. 
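The points above can be sketched as a short, self-contained script that someone else could paste and run. This is a Python illustration under stated assumptions: the three age values and the failing step are made up, and the layout (tiny dataset, failing step with its error, expected result and attempted fix) is one reasonable structure rather than a required template.

```python
# 1. A tiny, made-up dataset typed directly into the script,
#    so nobody needs access to your files to rerun it.
ages = ["34", "51", "28"]  # note: the values were read in as text, not numbers

# 2. The exact step that fails, with the exact error message it produces.
try:
    average = sum(ages) / len(ages)
except TypeError as err:
    error_message = f"TypeError: {err}"
    print(error_message)

# 3. What you expected instead, and what you have already tried.
#    Expected: the average age. Tried: converting the text to numbers first.
fixed_average = sum(int(a) for a in ages) / len(ages)
print(f"Average after converting to numbers: {fixed_average:.2f}")
```

Keeping the example this small makes it much easier for a helper to see the problem at a glance.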

Good etiquette

Search for answers before posting your question.
Let me google that for you. 🙄 

Describe the problem.
“It doesn’t work” isn’t descriptive enough. 

Describe your environment.
What operating system are you using? Which R version? Which packages? Which dataset?
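Most environments can report these details for you. In R, the built-in `sessionInfo()` function prints the R version, operating system, and loaded packages. The sketch below is a Python analogue included only for illustration; the helper name `environment_report` and the package names passed to it are hypothetical placeholders.

```python
import platform
from importlib import metadata


def environment_report(package_names):
    """Return a list of lines describing the OS, Python version, and packages."""
    lines = [
        f"Operating system: {platform.system()} {platform.release()}",
        f"Python version: {platform.python_version()}",
    ]
    for name in package_names:
        try:
            lines.append(f"{name}: {metadata.version(name)}")
        except metadata.PackageNotFoundError:
            lines.append(f"{name}: not installed")
    return lines


# Placeholder package names; list the ones your problem actually involves.
report = environment_report(["pip", "a-package-that-is-not-installed"])
print("\n".join(report))
```

Pasting output like this into a help request answers the environment questions before anyone has to ask.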

Describe the solution.
Confirm whether an offered solution works. Or, if you solve it on your own, post how you solved it.